1. Comparing Means

To start with, let’s warm up with a simple one-way ANOVA model. This example, from Whitlock and Schluter chapter 15, question 22, looks at the mass of lodgepole pine cones from different habitats.

1.1. Load and plot the data. Choose a plot that not only shows the raw data, but also the means and SE or CI of those means. +1 EC if Michael thinks it’s fancy.

#read in data
pines <- read.csv(file="https://www.zoology.ubc.ca/~whitlock/ABD/teaching/datasets/15/15q22LodgepolePineCones.csv")

#what's here?
#str(pines)

#plot cone mass
ggplot(pines, aes(x=habitat, y=conemass, fill=habitat))+
  geom_violin(trim=FALSE,alpha=0.8)+
  geom_point(position= position_jitterdodge())+
  geom_boxplot(fill="white", width=.15, alpha=0.8)+
  scale_fill_manual(values=c("forestgreen", "saddlebrown", "yellowgreen"))+
  labs(x="habitat", y="cone mass", title="Lodgepole pine cone mass by habitat")+
  theme_minimal()

1.2 Fit a model using least squares and evaluate all relevant assumptions. List them out as you test them. Can we use this model? If not, fix it. But if we can, no fix is needed!

pines_lm <- lm(conemass~habitat, data=pines)

# check assumptions
plot(pines_lm, which = 1) # no clear relationship between fitted values and residuals (equal variance of errors)

plot(pines_lm, which = 2) # qqplot follows a straight line - residuals are normally distributed

shapiro.test(residuals(pines_lm)) #passes residual normality test
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(pines_lm)
## W = 0.93363, p-value = 0.2781
plot(pines_lm, which = 4) #no strong outliers

plot(pines_lm, which = 5)

#add residuals to our data frame
pines <- pines %>%
  modelr::add_residuals(pines_lm)

#plotting residuals by group
ggplot(pines,
       aes(x = habitat, y = resid)) +
  geom_boxplot()

#check for HOV - looks good
car::leveneTest(pines_lm)
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  2  0.2148 0.8095
##       13
#evaluate model! 
anova(pines_lm)
## Analysis of Variance Table
## 
## Response: conemass
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## habitat    2 29.404 14.7020  50.085 7.787e-07 ***
## Residuals 13  3.816  0.2935                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# car::Anova
car::Anova(pines_lm, type="II")
## Anova Table (Type II tests)
## 
## Response: conemass
##           Sum Sq Df F value    Pr(>F)    
## habitat   29.404  2  50.085 7.787e-07 ***
## Residuals  3.816 13                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The model meets all of its assumptions (normal residuals, equal variance across groups, no influential outliers), so we can use it as-is and no fix is needed.

1.3 How much variation is explained by your model?

#see R2
summary(pines_lm)
## 
## Call:
## lm(formula = conemass ~ habitat, data = pines)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -0.780 -0.405 -0.040  0.505  0.720 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               8.9000     0.2212  40.238 4.97e-15 ***
## habitatisland.present    -2.8200     0.3281  -8.596 1.01e-06 ***
## habitatmainland.present  -2.7800     0.3281  -8.474 1.18e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5418 on 13 degrees of freedom
## Multiple R-squared:  0.8851, Adjusted R-squared:  0.8675 
## F-statistic: 50.09 on 2 and 13 DF,  p-value: 7.787e-07

The adjusted R-squared is 0.87, so the model explains about 87% of the variation in cone mass after accounting for the number of predictors.
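As a check, both R-squared values in the summary can be recovered from the sums of squares in the ANOVA table above; a quick base-R sketch (values copied from the output):

```r
# R^2 = SS_model / SS_total, using sums of squares from anova(pines_lm)
ss_habitat  <- 29.404   # habitat (model) sum of squares
ss_residual <- 3.816    # residual sum of squares
r2 <- ss_habitat / (ss_habitat + ss_residual)

# adjusted R^2 penalizes for the number of fitted parameters:
# here n = 16 observations and p = 2 habitat dummy terms
n <- 16; p <- 2
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

round(r2, 4)      # matches Multiple R-squared: 0.8851
round(adj_r2, 4)  # matches Adjusted R-squared: 0.8675
```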

1.4 Show which means are different from each other. Are you correcting p-values? If so, how, and justify your choice.

#contrast means between habitats
pines_em <- emmeans(pines_lm, ~habitat)

contrast(pines_em,
         method = "tukey", adjust="bonferroni") 
##  contrast                          estimate    SE df t.ratio p.value
##  island.absent - island.present        2.82 0.328 13  8.596  <.0001 
##  island.absent - mainland.present      2.78 0.328 13  8.474  <.0001 
##  island.present - mainland.present    -0.04 0.343 13 -0.117  1.0000 
## 
## P value adjustment: bonferroni method for 3 tests
#island.absent - island.present and island.absent - mainland.present were significantly different

Yes, I applied a Bonferroni correction to lower the chance of false positives arising from making multiple comparisons at once. The correction can be conservative, but with only three comparisons it makes little practical difference here.
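The Bonferroni correction just multiplies each p-value by the number of tests in the family (capping at 1), which is what emmeans did above. Base R’s p.adjust does the same thing; the p-values below are hypothetical, for illustration only:

```r
# three raw p-values from a hypothetical family of comparisons
p_raw <- c(0.010, 0.040, 0.500)

# Bonferroni: multiply each by the number of tests (3), cap at 1
p_adj <- p.adjust(p_raw, method = "bonferroni")
p_adj  # 0.03 0.12 1.00
```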

2. Comparing Means from Multiple Categories

2.1. Load and plot the data. We are interested in change in percent cover. Choose a plot that not only shows the raw data, but also the means and SE or CI of those means. +1 EC if Michael thinks it’s fancy.

#read in data
cover <- read.csv(file="fouling_transplant_data.csv") %>% 
  janitor::clean_names()
#str(cover)

#plot change in percent cover for caged/open and position on block
ggplot(data=cover, 
       mapping=aes(y=change_in_cover, x=caged, fill=position_on_block)) +
  geom_boxplot(width=0.9, position=position_dodge(width=1))+
  scale_fill_manual(values=c("coral1","seagreen2"))+
  labs(y="change in percent cover", x= NULL, fill="position on block")+
  theme_bw()

2.2 Fit a model using likelihood and evaluate all relevant assumptions. Do you meet assumptions?

cover_glm <- glm(change_in_cover~caged*position_on_block, data=cover, family = gaussian(link="identity"))

# check assumptions
plot(cover_glm, which = 1) #bad

plot(cover_glm, which = 2) #bad

shapiro.test(residuals(cover_glm)) #bad
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(cover_glm)
## W = 0.91664, p-value = 0.01687
plot(cover_glm, which = 4)

#add residuals to our data frame
cover <- cover %>%
  modelr::add_residuals(cover_glm)

#plotting residuals by group -- unequal variance
ggplot(cover,
       aes(x = caged, y = resid)) +
  geom_boxplot()

ggplot(cover,
       aes(x = position_on_block, y = resid)) +
  geom_boxplot()

ggplot(cover,
       aes(x = caged, y = resid, fill= position_on_block)) +
  geom_boxplot()

#check for HOV
car::leveneTest(cover_glm)
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)  
## group  3  2.7642 0.0605 .
##       28                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

2.3 If you answered yes to the above…. you are wrong. It doesn’t! Percentage data is weird. Differences in percentages can be even weirder! There are three tried and true solutions here. But they MIGHT not all work.

Incorporate initial cover as a covariate. This takes out that influence, and as such we’re looking at residuals of change. This sometimes, but not always, works.

Divide change by initial cover to express change as percent change relative to initial cover.

Calculate difference in logit cover (so, logit(initial cover) - logit(final cover)). Logit transformations linearize percent cover data, and are often all that is needed to work percent cover into a linear model. You can use car::logit() for this.

Try all three methods. Which one works so that you can produce valid inference?
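Before diving in, it may help to see what the logit in method 3 actually does. For proportions strictly inside (0, 1), car::logit() reduces to base R’s qlogis(p) = log(p / (1 - p)); car::logit() additionally remaps values at or near the boundaries, which qlogis cannot handle. A small sketch:

```r
# logit maps (0, 1) onto the whole real line, which is what
# linearizes percent-cover data for a linear model
p <- c(0.10, 0.50, 0.90)
logit_p <- qlogis(p)   # same as log(p / (1 - p)) for p in (0, 1)
logit_p                # symmetric about p = 0.5, which maps to 0

# method 3 below is then just, per observation:
# qlogis(initial_cover) - qlogis(final_cover)
```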

#1 - initial cover plus change (note: change + initial = final cover,
# so this effectively models final cover; putting initial_cover on the
# right-hand side as a covariate would follow the prompt even more directly)
cover1 <- cover %>% 
  mutate(initial_plus_change=change_in_cover+initial_cover)

cover_glm_1 <- glm(initial_plus_change~caged*position_on_block, data=cover1, family = gaussian(link="identity"))
summary(cover_glm_1)
#check assumptions
plot(cover_glm_1, which = 1) 

plot(cover_glm_1, which = 2) # not great, but better than the original model

plot(cover_glm_1, which = 3)

plot(cover_glm_1, which = 4)

#2 - add relative percent change to df
cover2 <- cover %>% 
  mutate(percent_change_relative=change_in_cover/initial_cover)

cover_glm_2 <- glm(percent_change_relative~caged*position_on_block, data=cover2, family = gaussian(link="identity"))
summary(cover_glm_2)
#check assumptions
plot(cover_glm_2, which = 1) # variance is not equal

plot(cover_glm_2, which = 2) # diverges from normal line

plot(cover_glm_2, which = 3)

plot(cover_glm_2, which = 4)

#3 - add dif_logit cover to df
cover3 <- cover2 %>% 
  mutate(dif_logit_cover=car::logit(initial_cover) - car::logit(final_cover))

cover_glm_3 <- glm(dif_logit_cover~caged*position_on_block, data=cover3, family = gaussian(link="identity"))
summary(cover_glm_3)
#assumptions
plot(cover_glm_3, which = 1) # horrible

plot(cover_glm_3, which = 2) # horrible

plot(cover_glm_3, which = 3)

plot(cover_glm_3, which = 4) 

#going with model 1 because it best meets assumptions

The first model (initial cover + change in cover, i.e. final cover) works best for valid inference, as it violates the fewest assumptions.

2.4 Great! So, take us home! Using NHST with an alpha of 0.08 (why not), what does this fit model tell you about whether predation matters given how I have described the system? Feel free to replot the data or fit model results if helpful

summary(cover_glm_1) #evaluate coefficients of the chosen model against alpha = 0.08
#re-plot data using initial_plus_change (i.e., final percent cover)
ggplot(data=cover1, 
       mapping=aes(y=initial_plus_change, x=caged, fill=position_on_block)) +
  geom_boxplot(width=0.9, position=position_dodge(width=1))+
  scale_fill_manual(values=c("coral1","seagreen2"))+
  labs(y="final percent cover", x= NULL, fill="position on block")+
  theme_bw()

3. Comparing Means with Covariates

We will wrap up with a model mixing continuous and discrete variables. In this dataset from Scantlebury et al., the authors explored how caste and mass affect the energy expenditure of naked mole rats.

3.1 OK, you know what you are about at this point. Load in the data, plot it, fit it, check assumptions. Use Bayes for this.

#read in data
rats <- read.csv(file="https://www.zoology.ubc.ca/~whitlock/ABD/teaching/datasets/18/18e4MoleRatLayabouts.csv")

#str(rats)
#check distribution
hist(rats$lnenergy)

shapiro.test(rats$lnenergy)
## 
##  Shapiro-Wilk normality test
## 
## data:  rats$lnenergy
## W = 0.96564, p-value = 0.3351
#check distribution
hist(rats$lnmass)

shapiro.test(rats$lnmass)
## 
##  Shapiro-Wilk normality test
## 
## data:  rats$lnmass
## W = 0.97214, p-value = 0.5048
#lnenergy ~ caste
ggplot(rats, aes(x=caste, y=lnenergy, color=caste))+
  geom_point(position=position_jitter(width=0.2))

#lnenergy ~ lnmass
ggplot(rats, aes(x=lnmass, y=lnenergy))+
  geom_point()

#interaction
rat_brm <- brm(lnenergy~lnmass*caste, data=rats, family=gaussian(link="identity"))
#additive 
rat_brm2 <- brm(lnenergy~lnmass+caste, data=rats, family=gaussian(link="identity"))
# visually investigate our chains - they all match up

plot(rat_brm) 

plot(rat_brm2) 

#look at a diagnostic of convergence - the Gelman-Rubin statistic (Rhat) - looks good

rhat(rat_brm)
##          b_Intercept             b_lnmass        b_casteworker 
##             1.005667             1.005663             1.005654 
## b_lnmass:casteworker                sigma                 lp__ 
##             1.005607             1.000632             1.002645
rhat(rat_brm2)
##   b_Intercept      b_lnmass b_casteworker         sigma          lp__ 
##     0.9998248     0.9997842     1.0001903     0.9993894     0.9998061
rhat(rat_brm) %>% mcmc_rhat()

rhat(rat_brm2) %>% mcmc_rhat()
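For intuition, the Gelman-Rubin statistic compares between-chain to within-chain variance: well-mixed chains give a ratio near 1. A minimal toy version in base R (deliberately simplified: it skips the split-chain and rank-normalization refinements that brms/rstan use, and runs on simulated chains rather than the actual posterior draws):

```r
# toy Rhat: near 1 when between-chain variance is small
# relative to within-chain variance
simple_rhat <- function(chains) {         # chains: iterations x chains matrix
  n <- nrow(chains)
  w <- mean(apply(chains, 2, var))        # within-chain variance W
  b <- n * var(colMeans(chains))          # between-chain variance B
  var_hat <- (n - 1) / n * w + b / n      # pooled posterior variance estimate
  sqrt(var_hat / w)
}

set.seed(42)
mixed <- matrix(rnorm(4000), ncol = 4)    # four chains from the same target
simple_rhat(mixed)                        # ~1, as in the rhat() output above
```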

#assess autocorrelation - looks good

mcmc_acf(rat_brm) 

mcmc_acf(rat_brm2) 

#check the match between our data and our chains for the distribution of y - looks alright
pp_check(rat_brm, "dens_overlay")

pp_check(rat_brm2, "dens_overlay")

#is our error normal?
pp_check(rat_brm, "error_hist", bins=10) #sort of normal

pp_check(rat_brm2, "error_hist", bins=10) #sort of normal

#see fitted vs residuals

rat_res <- residuals(rat_brm) %>%  #first get residuals
  as_tibble #make df

rat_fit <- fitted(rat_brm) %>% #then get fitted values
  as_tibble

plot(y=rat_res$Estimate, x=rat_fit$Estimate) # looks good

rat_res2 <- residuals(rat_brm2) %>%  #first get residuals
  as_tibble #make df

rat_fit2 <- fitted(rat_brm2) %>% #then get fitted values
  as_tibble

plot(y=rat_res2$Estimate, x=rat_fit2$Estimate) # looks good

3.2 Examine whether there is an interaction or not using LOO cross-validation. Is a model with an interaction more predictive?

#interaction

rat_loo <- loo(rat_brm) #Leave-one-out inference
rat_loo
## 
## Computed from 4000 by 35 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -10.2 4.1
## p_loo         4.2 1.0
## looic        20.5 8.2
## ------
## Monte Carlo SE of elpd_loo is 0.1.
## 
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.
#additive - more predictive!

rat_loo2 <- loo(rat_brm2) #Leave-one-out inference
rat_loo2
## 
## Computed from 4000 by 35 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo     -9.7 4.2
## p_loo         3.6 1.0
## looic        19.4 8.5
## ------
## Monte Carlo SE of elpd_loo is 0.0.
## 
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.

The model without the interaction is slightly more predictive (elpd_loo of -9.7 vs. -10.2), so the interaction does not improve out-of-sample prediction.
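The comparison above eyeballs the two elpd_loo summaries; the raw difference is small. Note that the SE of the *difference* cannot be reconstructed from the two summary tables alone (it needs the pointwise elpd values), which is what loo::loo_compare() computes. The arithmetic from the output above:

```r
# elpd_loo values copied from the two loo() summaries above
elpd_interaction <- -10.2
elpd_additive    <- -9.7
elpd_diff <- elpd_additive - elpd_interaction
elpd_diff  # 0.5 in favor of the additive model - small relative to
           # the reported SEs (~4) of each estimate

# loo::loo_compare(loo(rat_brm), loo(rat_brm2)) would additionally
# report se_diff, the proper uncertainty for this difference
```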

3.3 Compare the two castes’ energy expenditure at the mean level of log mass. Are they different? How would you discuss your conclusions?

rat_em <- emmeans(rat_brm2, ~caste)
contrast(rat_em,
         method = "tukey", adjust="bonferroni")
##  contrast      estimate lower.HPD upper.HPD
##  lazy - worker   -0.387    -0.675   -0.0934
## 
## Point estimate displayed: median 
## HPD interval probability: 0.95
contrast(emmeans(rat_brm2, ~caste), method="tukey") %>% 
  plot()+geom_vline(xintercept=0)

Yes, they are different. The 95% HPD interval for the lazy - worker contrast (-0.68, -0.09) does not cross 0, so the lazy caste has lower estimated energy expenditure than the worker caste at the mean log mass.
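As a reminder of where that interval comes from: emmeans summarizes the posterior draws of the contrast. A central credible interval is just quantiles of the draws (an HPD interval is the narrowest interval with that coverage, which is nearly the same for a symmetric posterior). A toy illustration with simulated draws, not the actual posterior:

```r
set.seed(1)
draws <- rnorm(4000, mean = -0.39, sd = 0.15)  # stand-in for contrast draws
ci <- quantile(draws, c(0.025, 0.975))          # central 95% credible interval
ci  # excludes 0, mirroring the HPD result above
```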

3.4 Plot the fit model. Use tidybayes and ggdist with your model to show fit and credible intervals with the raw data points on top. modelr::data_grid() might help as well.

#gather coefficient draws
rat_coef_draws <- gather_draws(rat_brm2,
                               b_Intercept,
                               b_lnmass,
                               b_casteworker,
                               sigma)

#use stat_halfeye to plot with credible interval!
ggplot(rat_coef_draws,
       mapping=aes(x=.value))+
  facet_wrap(~.variable, scales="free")+
  stat_halfeye()

Extra credit: submitted via GitHub.